Overview

Dataset statistics

Statistic                      Dataset A  Dataset B
Number of variables            12         12
Number of observations         446        446
Missing cells                  439        431
Missing cells (%)              8.2%       8.1%
Duplicate rows                 0          0
Duplicate rows (%)             0.0%       0.0%
Total size in memory           45.3 KiB   45.3 KiB
Average record size in memory  104.0 B    104.0 B
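
The missing-cell percentage in the table follows from the other figures: missing cells divided by the total number of cells (variables times observations). A minimal Python sketch of that arithmetic; the function name `missing_cell_share` is an illustrative choice, not part of ydata-profiling.

```python
def missing_cell_share(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Return the share of missing cells as a percentage of all cells."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


# Figures taken from the Dataset statistics table.
for name, missing in {"Dataset A": 439, "Dataset B": 431}.items():
    print(f"{name}: {missing_cell_share(missing, 12, 446):.1f}%")
# Dataset A: 8.2%
# Dataset B: 8.1%
```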

Variable types

Type         Dataset A  Dataset B
Numeric      5          5
Categorical  4          4
Text         3          3

Alerts

Dataset A                                       Dataset B                                       Alert type
Survived is highly overall correlated with Sex  Survived is highly overall correlated with Sex  High Correlation
Sex is highly overall correlated with Survived  Sex is highly overall correlated with Survived  High Correlation
Age has 90 (20.2%) missing values               Age has 84 (18.8%) missing values               Missing
Cabin has 347 (77.8%) missing values            Cabin has 346 (77.6%) missing values            Missing
PassengerId has unique values                   PassengerId has unique values                   Unique
Name has unique values                          Name has unique values                          Unique
SibSp has 307 (68.8%) zeros                     SibSp has 309 (69.3%) zeros                     Zeros
Parch has 338 (75.8%) zeros                     Parch has 343 (76.9%) zeros                     Zeros
Fare has 8 (1.8%) zeros                         Fare has 11 (2.5%) zeros                        Zeros
Alert not present in this dataset               SibSp is highly overall correlated with Parch   High Correlation
Alert not present in this dataset               Parch is highly overall correlated with SibSp   High Correlation

Reproduction

Item                    Dataset A                   Dataset B
Analysis started        2023-09-12 08:35:18.700398  2023-09-12 08:35:23.583917
Analysis finished       2023-09-12 08:35:23.582476  2023-09-12 08:35:27.435033
Duration                4.88 seconds                3.85 seconds
Software version        ydata-profiling v0.0.dev0   ydata-profiling v0.0.dev0
Download configuration  config.json                 config.json
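
The Duration row is the difference between the two timestamps above it. A short sketch that recomputes it with the standard library; the format string and the helper name `duration_seconds` are assumptions made for this illustration.

```python
from datetime import datetime

# Timestamp layout as printed in the Reproduction table (assumed format string).
FMT = "%Y-%m-%d %H:%M:%S.%f"


def duration_seconds(started: str, finished: str) -> float:
    """Return the elapsed time between two report timestamps, in seconds."""
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    return delta.total_seconds()


a = duration_seconds("2023-09-12 08:35:18.700398", "2023-09-12 08:35:23.582476")
b = duration_seconds("2023-09-12 08:35:23.583917", "2023-09-12 08:35:27.435033")
print(f"Dataset A: {a:.2f} seconds")  # Dataset A: 4.88 seconds
print(f"Dataset B: {b:.2f} seconds")  # Dataset B: 3.85 seconds
```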

Variables

PassengerId
Real number (ℝ)

Statistic     Dataset A  Dataset B
Distinct      446        446
Distinct (%)  100.0%     100.0%
Missing       0          0
Missing (%)   0.0%       0.0%
Infinite      0          0
Infinite (%)  0.0%       0.0%
Mean          430.83857  452.16592
Minimum       1          1
Maximum       891        891
Zeros         0          0
Zeros (%)     0.0%       0.0%
Negative      0          0
Negative (%)  0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Quantile statistics

Statistic                  Dataset A  Dataset B
Minimum                    1          1
5-th percentile            42.25      51.5
Q1                         199.25     244.25
median                     422        448.5
Q3                         659        678
95-th percentile           842.75     838.75
Maximum                    891        891
Range                      890        890
Interquartile range (IQR)  459.75     433.75

Descriptive statistics

Statistic                        Dataset A      Dataset B
Standard deviation               261.04351      253.40065
Coefficient of variation (CV)    0.60589634     0.5604152
Kurtosis                         -1.2239694     -1.1747123
Mean                             430.83857      452.16592
Median Absolute Deviation (MAD)  228.5          217.5
Skewness                         0.063354926    -0.036099735
Sum                              192154         201666
Variance                         68143.713      64211.892
Monotonicity                     Not monotonic  Not monotonic
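
Several of the reported statistics can be derived from others: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, the interquartile range is Q3 minus Q1, and the range is maximum minus minimum. The sketch below checks this against the Dataset A figures for PassengerId; `derived_statistics` is a hypothetical helper, and small differences are expected because the report rounds its inputs.

```python
def derived_statistics(mean, std, q1, q3, minimum, maximum):
    """Recompute statistics that follow directly from other reported values."""
    return {
        "cv": std / mean,            # coefficient of variation
        "variance": std ** 2,        # squared standard deviation
        "iqr": q3 - q1,              # interquartile range
        "range": maximum - minimum,  # spread between the extremes
    }


# PassengerId, Dataset A, values copied from the tables above.
stats_a = derived_statistics(
    mean=430.83857, std=261.04351, q1=199.25, q3=659, minimum=1, maximum=891
)
for key, value in stats_a.items():
    print(f"{key}: {value:.6g}")
```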
Histogram with fixed size bins (bins=50)
Dataset A (sorted by frequency)
Value               Count  Frequency (%)
56                  1      0.2%
405                 1      0.2%
678                 1      0.2%
433                 1      0.2%
341                 1      0.2%
573                 1      0.2%
131                 1      0.2%
518                 1      0.2%
526                 1      0.2%
713                 1      0.2%
Other values (436)  436    97.8%

Dataset B (sorted by frequency)
Value               Count  Frequency (%)
264                 1      0.2%
370                 1      0.2%
624                 1      0.2%
303                 1      0.2%
703                 1      0.2%
367                 1      0.2%
477                 1      0.2%
871                 1      0.2%
507                 1      0.2%
697                 1      0.2%
Other values (436)  436    97.8%

Dataset A (sorted by value, smallest first)
Value  Count  Frequency (%)
1      1      0.2%
2      1      0.2%
5      1      0.2%
7      1      0.2%
11     1      0.2%
12     1      0.2%
13     1      0.2%
14     1      0.2%
16     1      0.2%
17     1      0.2%

Dataset B (sorted by value, smallest first)
Value  Count  Frequency (%)
1      1      0.2%
5      1      0.2%
7      1      0.2%
9      1      0.2%
11     1      0.2%
14     1      0.2%
17     1      0.2%
18     1      0.2%
20     1      0.2%
22     1      0.2%

Survived
Categorical

Statistic     Dataset A  Dataset B
Distinct      2          2
Distinct (%)  0.4%       0.4%
Missing       0          0
Missing (%)   0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Value  Dataset A  Dataset B
0      274        268
1      172        178

Length

Statistic      Dataset A  Dataset B
Max length     1          1
Median length  1          1
Mean length    1          1
Min length     1          1

Characters and Unicode

Statistic            Dataset A  Dataset B
Total characters     446        446
Distinct characters  2          2
Distinct categories  1          1
Distinct scripts     1          1
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
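
As an illustration of how such character properties can be tallied, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. The `CATEGORY_NAMES` mapping and the `category_counts` helper are assumptions made for this example; they are not taken from ydata-profiling.

```python
import unicodedata
from collections import Counter

# Illustrative mapping from Unicode general-category codes to the long
# names used in the report tables; only codes seen in this report are listed.
CATEGORY_NAMES = {
    "Nd": "Decimal Number",
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pd": "Dash Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in str(value):
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# Values shaped like the Survived column: every character is a decimal digit.
print(category_counts(["0", "1", "1"]))
# A name from the Name samples in this report mixes several categories.
print(category_counts(["Woolner, Mr. Hugh"]))
```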

Unique

Statistic   Dataset A  Dataset B
Unique      0          0
Unique (%)  0.0%       0.0%

Sample

Row      Dataset A  Dataset B
1st row  1          0
2nd row  0          0
3rd row  1          1
4th row  1          1
5th row  1          0

Common Values

Dataset A
Value  Count  Frequency (%)
0      274    61.4%
1      172    38.6%

Dataset B
Value  Count  Frequency (%)
0      268    60.1%
1      178    39.9%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Dataset A and Dataset B]
ValueCountFrequency (%)
0 274
61.4%
1 172
38.6%
ValueCountFrequency (%)
0 268
60.1%
1 178
39.9%

Most occurring characters

ValueCountFrequency (%)
0 274
61.4%
1 172
38.6%
ValueCountFrequency (%)
0 268
60.1%
1 178
39.9%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 274
61.4%
1 172
38.6%
ValueCountFrequency (%)
0 268
60.1%
1 178
39.9%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 274
61.4%
1 172
38.6%
ValueCountFrequency (%)
0 268
60.1%
1 178
39.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 274
61.4%
1 172
38.6%
ValueCountFrequency (%)
0 268
60.1%
1 178
39.9%

Pclass
Categorical

Statistic     Dataset A  Dataset B
Distinct      3          3
Distinct (%)  0.7%       0.7%
Missing       0          0
Missing (%)   0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Value  Dataset A  Dataset B
3      246        243
1      104        107
2      96         96

Length

Statistic      Dataset A  Dataset B
Max length     1          1
Median length  1          1
Mean length    1          1
Min length     1          1

Characters and Unicode

Statistic            Dataset A  Dataset B
Total characters     446        446
Distinct characters  3          3
Distinct categories  1          1
Distinct scripts     1          1
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Dataset A  Dataset B
Unique      0          0
Unique (%)  0.0%       0.0%

Sample

Row      Dataset A  Dataset B
1st row  1          1
2nd row  3          1
3rd row  2          3
4th row  3          1
5th row  1          3

Common Values

Dataset A
Value  Count  Frequency (%)
3      246    55.2%
1      104    23.3%
2      96     21.5%

Dataset B
Value  Count  Frequency (%)
3      243    54.5%
1      107    24.0%
2      96     21.5%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Dataset A and Dataset B]
ValueCountFrequency (%)
3 246
55.2%
1 104
23.3%
2 96
 
21.5%
ValueCountFrequency (%)
3 243
54.5%
1 107
24.0%
2 96
 
21.5%

Most occurring characters

ValueCountFrequency (%)
3 246
55.2%
1 104
23.3%
2 96
 
21.5%
ValueCountFrequency (%)
3 243
54.5%
1 107
24.0%
2 96
 
21.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 446
100.0%
ValueCountFrequency (%)
Decimal Number 446
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 246
55.2%
1 104
23.3%
2 96
 
21.5%
ValueCountFrequency (%)
3 243
54.5%
1 107
24.0%
2 96
 
21.5%

Most occurring scripts

ValueCountFrequency (%)
Common 446
100.0%
ValueCountFrequency (%)
Common 446
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3 246
55.2%
1 104
23.3%
2 96
 
21.5%
ValueCountFrequency (%)
3 243
54.5%
1 107
24.0%
2 96
 
21.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 446
100.0%
ValueCountFrequency (%)
ASCII 446
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 246
55.2%
1 104
23.3%
2 96
 
21.5%
ValueCountFrequency (%)
3 243
54.5%
1 107
24.0%
2 96
 
21.5%

Name
Text

Statistic     Dataset A  Dataset B
Distinct      446        446
Distinct (%)  100.0%     100.0%
Missing       0          0
Missing (%)   0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Length

Statistic      Dataset A  Dataset B
Max length     67         57
Median length  48         48
Mean length    26.704036  26.784753
Min length     12         12

Characters and Unicode

Statistic            Dataset A  Dataset B
Total characters     11910      11946
Distinct characters  60         60
Distinct categories  7          7
Distinct scripts     2          2
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Dataset A  Dataset B
Unique      446        446
Unique (%)  100.0%     100.0%

Sample

Row      Dataset A                       Dataset B
1st row  Woolner, Mr. Hugh               Harrison, Mr. William
2nd row  Johanson, Mr. Jakob Alfred      Smith, Mr. Richard William
3rd row  Richards, Master. William Rowe  Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck)
4th row  Osman, Mrs. Mara                Graham, Miss. Margaret Edith
5th row  Carter, Mr. William Ernest      O'Brien, Mr. Timothy
ValueCountFrequency (%)
mr 256
 
14.2%
miss 104
 
5.8%
mrs 58
 
3.2%
william 32
 
1.8%
john 25
 
1.4%
master 21
 
1.2%
henry 18
 
1.0%
mary 13
 
0.7%
james 13
 
0.7%
george 11
 
0.6%
Other values (893) 1250
69.4%
ValueCountFrequency (%)
mr 269
 
14.8%
miss 84
 
4.6%
mrs 65
 
3.6%
william 38
 
2.1%
john 23
 
1.3%
henry 17
 
0.9%
master 16
 
0.9%
charles 15
 
0.8%
george 15
 
0.8%
james 13
 
0.7%
Other values (885) 1259
69.4%

Most occurring characters

ValueCountFrequency (%)
1356
 
11.4%
r 957
 
8.0%
e 853
 
7.2%
a 820
 
6.9%
s 665
 
5.6%
n 661
 
5.5%
i 655
 
5.5%
M 575
 
4.8%
l 508
 
4.3%
o 496
 
4.2%
Other values (50) 4364
36.6%
ValueCountFrequency (%)
1368
 
11.5%
r 977
 
8.2%
e 849
 
7.1%
a 806
 
6.7%
n 673
 
5.6%
i 661
 
5.5%
s 614
 
5.1%
M 562
 
4.7%
l 530
 
4.4%
o 501
 
4.2%
Other values (50) 4405
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 7651
64.2%
Uppercase Letter 1819
 
15.3%
Space Separator 1356
 
11.4%
Other Punctuation 950
 
8.0%
Close Punctuation 62
 
0.5%
Open Punctuation 62
 
0.5%
Dash Punctuation 10
 
0.1%
ValueCountFrequency (%)
Lowercase Letter 7671
64.2%
Uppercase Letter 1823
 
15.3%
Space Separator 1368
 
11.5%
Other Punctuation 941
 
7.9%
Open Punctuation 69
 
0.6%
Close Punctuation 69
 
0.6%
Dash Punctuation 5
 
< 0.1%

Most frequent character per category

Space Separator
ValueCountFrequency (%)
1356
100.0%
ValueCountFrequency (%)
1368
100.0%
Lowercase Letter
ValueCountFrequency (%)
r 957
12.5%
e 853
11.1%
a 820
10.7%
s 665
8.7%
n 661
8.6%
i 655
8.6%
l 508
 
6.6%
o 496
 
6.5%
t 337
 
4.4%
h 244
 
3.2%
Other values (16) 1455
19.0%
ValueCountFrequency (%)
r 977
12.7%
e 849
11.1%
a 806
10.5%
n 673
8.8%
i 661
8.6%
s 614
8.0%
l 530
 
6.9%
o 501
 
6.5%
t 338
 
4.4%
h 276
 
3.6%
Other values (16) 1446
18.9%
Uppercase Letter
ValueCountFrequency (%)
M 575
31.6%
A 127
 
7.0%
J 112
 
6.2%
H 106
 
5.8%
C 95
 
5.2%
S 93
 
5.1%
E 74
 
4.1%
W 71
 
3.9%
B 64
 
3.5%
L 61
 
3.4%
Other values (15) 441
24.2%
ValueCountFrequency (%)
M 562
30.8%
A 122
 
6.7%
J 112
 
6.1%
H 100
 
5.5%
S 92
 
5.0%
E 88
 
4.8%
C 87
 
4.8%
W 83
 
4.6%
G 69
 
3.8%
B 61
 
3.3%
Other values (15) 447
24.5%
Other Punctuation
ValueCountFrequency (%)
, 446
46.9%
. 446
46.9%
" 54
 
5.7%
' 3
 
0.3%
/ 1
 
0.1%
ValueCountFrequency (%)
, 446
47.4%
. 446
47.4%
" 44
 
4.7%
' 4
 
0.4%
/ 1
 
0.1%
Close Punctuation
ValueCountFrequency (%)
) 62
100.0%
ValueCountFrequency (%)
) 69
100.0%
Open Punctuation
ValueCountFrequency (%)
( 62
100.0%
ValueCountFrequency (%)
( 69
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 10
100.0%
ValueCountFrequency (%)
- 5
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 9470
79.5%
Common 2440
 
20.5%
ValueCountFrequency (%)
Latin 9494
79.5%
Common 2452
 
20.5%

Most frequent character per script

Common
ValueCountFrequency (%)
1356
55.6%
, 446
 
18.3%
. 446
 
18.3%
) 62
 
2.5%
( 62
 
2.5%
" 54
 
2.2%
- 10
 
0.4%
' 3
 
0.1%
/ 1
 
< 0.1%
ValueCountFrequency (%)
1368
55.8%
, 446
 
18.2%
. 446
 
18.2%
( 69
 
2.8%
) 69
 
2.8%
" 44
 
1.8%
- 5
 
0.2%
' 4
 
0.2%
/ 1
 
< 0.1%
Latin
ValueCountFrequency (%)
r 957
 
10.1%
e 853
 
9.0%
a 820
 
8.7%
s 665
 
7.0%
n 661
 
7.0%
i 655
 
6.9%
M 575
 
6.1%
l 508
 
5.4%
o 496
 
5.2%
t 337
 
3.6%
Other values (41) 2943
31.1%
ValueCountFrequency (%)
r 977
 
10.3%
e 849
 
8.9%
a 806
 
8.5%
n 673
 
7.1%
i 661
 
7.0%
s 614
 
6.5%
M 562
 
5.9%
l 530
 
5.6%
o 501
 
5.3%
t 338
 
3.6%
Other values (41) 2983
31.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 11910
100.0%
ValueCountFrequency (%)
ASCII 11946
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1356
 
11.4%
r 957
 
8.0%
e 853
 
7.2%
a 820
 
6.9%
s 665
 
5.6%
n 661
 
5.5%
i 655
 
5.5%
M 575
 
4.8%
l 508
 
4.3%
o 496
 
4.2%
Other values (50) 4364
36.6%
ValueCountFrequency (%)
1368
 
11.5%
r 977
 
8.2%
e 849
 
7.1%
a 806
 
6.7%
n 673
 
5.6%
i 661
 
5.5%
s 614
 
5.1%
M 562
 
4.7%
l 530
 
4.4%
o 501
 
4.2%
Other values (50) 4405
36.9%

Sex
Categorical

Statistic     Dataset A  Dataset B
Distinct      2          2
Distinct (%)  0.4%       0.4%
Missing       0          0
Missing (%)   0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Value   Dataset A  Dataset B
male    284        295
female  162        151

Length

Statistic      Dataset A  Dataset B
Max length     6          6
Median length  4          4
Mean length    4.7264574  4.67713
Min length     4          4

Characters and Unicode

Statistic            Dataset A  Dataset B
Total characters     2108       2086
Distinct characters  5          5
Distinct categories  1          1
Distinct scripts     1          1
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Dataset A  Dataset B
Unique      0          0
Unique (%)  0.0%       0.0%

Sample

Row      Dataset A  Dataset B
1st row  male       male
2nd row  male       male
3rd row  male       female
4th row  female     female
5th row  male       male

Common Values

Dataset A
Value   Count  Frequency (%)
male    284    63.7%
female  162    36.3%

Dataset B
Value   Count  Frequency (%)
male    295    66.1%
female  151    33.9%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Dataset A and Dataset B]
ValueCountFrequency (%)
male 284
63.7%
female 162
36.3%
ValueCountFrequency (%)
male 295
66.1%
female 151
33.9%

Most occurring characters

ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 597
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 151
 
7.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 2108
100.0%
ValueCountFrequency (%)
Lowercase Letter 2086
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 597
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 151
 
7.2%

Most occurring scripts

ValueCountFrequency (%)
Latin 2108
100.0%
ValueCountFrequency (%)
Latin 2086
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 597
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 151
 
7.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2108
100.0%
ValueCountFrequency (%)
ASCII 2086
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 608
28.8%
m 446
21.2%
a 446
21.2%
l 446
21.2%
f 162
 
7.7%
ValueCountFrequency (%)
e 597
28.6%
m 446
21.4%
a 446
21.4%
l 446
21.4%
f 151
 
7.2%

Age
Real number (ℝ)

Statistic     Dataset A  Dataset B
Distinct      76         70
Distinct (%)  21.3%      19.3%
Missing       90         84
Missing (%)   20.2%      18.8%
Infinite      0          0
Infinite (%)  0.0%       0.0%
Mean          29.743202  30.995387
Minimum       0.75       0.67
Maximum       74         80
Zeros         0          0
Zeros (%)     0.0%       0.0%
Negative      0          0
Negative (%)  0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Quantile statistics

Statistic                  Dataset A  Dataset B
Minimum                    0.75       0.67
5-th percentile            4          4.05
Q1                         20.75      21
median                     28         30
Q3                         38         39
95-th percentile           56.25      58.95
Maximum                    74         80
Range                      73.25      79.33
Interquartile range (IQR)  17.25      18

Descriptive statistics

Statistic                        Dataset A      Dataset B
Standard deviation               14.934172      14.914041
Coefficient of variation (CV)    0.50210372     0.48116971
Kurtosis                         0.12780732     0.17071811
Mean                             29.743202      30.995387
Median Absolute Deviation (MAD)  8              9
Skewness                         0.4312631      0.37901609
Sum                              10588.58       11220.33
Variance                         223.02951      222.42863
Monotonicity                     Not monotonic  Not monotonic
Histogram with fixed size bins (bins=50)
Dataset A (sorted by frequency)
Value              Count  Frequency (%)
24                 20     4.5%
21                 16     3.6%
22                 16     3.6%
28                 15     3.4%
36                 12     2.7%
18                 12     2.7%
30                 12     2.7%
31                 11     2.5%
16                 11     2.5%
32                 11     2.5%
Other values (66)  220    49.3%
(Missing)          90     20.2%

Dataset B (sorted by frequency)
Value              Count  Frequency (%)
30                 16     3.6%
28                 15     3.4%
24                 15     3.4%
19                 13     2.9%
36                 13     2.9%
32                 12     2.7%
22                 12     2.7%
18                 12     2.7%
34                 11     2.5%
25                 11     2.5%
Other values (60)  232    52.0%
(Missing)          84     18.8%

Dataset A (sorted by value, smallest first)
Value  Count  Frequency (%)
0.75   1      0.2%
0.83   1      0.2%
1      4      0.9%
2      6      1.3%
3      4      0.9%
4      4      0.9%
5      3      0.7%
6      2      0.4%
8      3      0.7%
9      5      1.1%

Dataset B (sorted by value, smallest first)
Value  Count  Frequency (%)
0.67   1      0.2%
0.75   2      0.4%
0.83   2      0.4%
1      1      0.2%
2      6      1.3%
3      3      0.7%
4      4      0.9%
5      3      0.7%
7      1      0.2%
8      2      0.4%

SibSp
Real number (ℝ)

Statistic     Dataset A   Dataset B
Distinct      7           7
Distinct (%)  1.6%        1.6%
Missing       0           0
Missing (%)   0.0%        0.0%
Infinite      0           0
Infinite (%)  0.0%        0.0%
Mean          0.57623318  0.47309417
Minimum       0           0
Maximum       8           8
Zeros         307         309
Zeros (%)     68.8%       69.3%
Negative      0           0
Negative (%)  0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB

Quantile statistics

Statistic                  Dataset A  Dataset B
Minimum                    0          0
5-th percentile            0          0
Q1                         0          0
median                     0          0
Q3                         1          1
95-th percentile           3          2
Maximum                    8          8
Range                      8          8
Interquartile range (IQR)  1          1

Descriptive statistics

Statistic                        Dataset A      Dataset B
Standard deviation               1.2815967      1.0180143
Coefficient of variation (CV)    2.2240938      2.1518217
Kurtosis                         16.788616      22.019084
Mean                             0.57623318     0.47309417
Median Absolute Deviation (MAD)  0              0
Skewness                         3.7383769      4.0207869
Sum                              257            211
Variance                         1.64249        1.0363531
Monotonicity                     Not monotonic  Not monotonic
Histogram with fixed size bins (bins=7)
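
For reference, fixed-size binning splits the observed range into equally wide intervals and counts the values falling in each. The sketch below applies that idea to the SibSp value counts reported for Dataset A. It assumes the common convention in which the last bin includes the maximum (as `numpy.histogram` does); whether the report's plot used exactly this rule is not stated in the source, and `fixed_width_histogram` is a hypothetical helper.

```python
def fixed_width_histogram(value_counts, bins):
    """Aggregate {value: count} pairs into `bins` equally wide intervals."""
    lo, hi = min(value_counts), max(value_counts)
    width = (hi - lo) / bins
    counts = [0] * bins
    for value, count in value_counts.items():
        if width == 0:
            index = 0  # all values identical: everything lands in the first bin
        else:
            # Clamp so the maximum falls into the last bin instead of overflowing.
            index = min(int((value - lo) / width), bins - 1)
        counts[index] += count
    return counts


# SibSp value counts for Dataset A, as listed in this report.
sibsp_a = {0: 307, 1: 96, 2: 15, 3: 10, 4: 10, 5: 1, 8: 7}
print(fixed_width_histogram(sibsp_a, bins=7))
# [403, 15, 10, 10, 1, 0, 7]
```

With a range of 8 and 7 bins, each bin is about 1.14 wide, so the values 0 and 1 share the first bin.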
Dataset A (sorted by frequency)
Value  Count  Frequency (%)
0      307    68.8%
1      96     21.5%
2      15     3.4%
4      10     2.2%
3      10     2.2%
8      7      1.6%
5      1      0.2%

Dataset B (sorted by frequency)
Value  Count  Frequency (%)
0      309    69.3%
1      107    24.0%
2      11     2.5%
3      8      1.8%
4      6      1.3%
8      3      0.7%
5      2      0.4%

Dataset A (sorted by value, smallest first)
Value  Count  Frequency (%)
0      307    68.8%
1      96     21.5%
2      15     3.4%
3      10     2.2%
4      10     2.2%
5      1      0.2%
8      7      1.6%

Dataset B (sorted by value, smallest first)
Value  Count  Frequency (%)
0      309    69.3%
1      107    24.0%
2      11     2.5%
3      8      1.8%
4      6      1.3%
5      2      0.4%
8      3      0.7%

Parch
Real number (ℝ)

Statistic     Dataset A   Dataset B
Distinct      6           7
Distinct (%)  1.3%        1.6%
Missing       0           0
Missing (%)   0.0%        0.0%
Infinite      0           0
Infinite (%)  0.0%        0.0%
Mean          0.37892377  0.36995516
Minimum       0           0
Maximum       5           6
Zeros         338         343
Zeros (%)     75.8%       76.9%
Negative      0           0
Negative (%)  0.0%        0.0%
Memory size   7.0 KiB     7.0 KiB

Quantile statistics

Statistic                  Dataset A  Dataset B
Minimum                    0          0
5-th percentile            0          0
Q1                         0          0
median                     0          0
Q3                         0          0
95-th percentile           2          2
Maximum                    5          6
Range                      5          6
Interquartile range (IQR)  0          0

Descriptive statistics

Statistic                        Dataset A      Dataset B
Standard deviation               0.78319697     0.82374508
Coefficient of variation (CV)    2.0668985      2.2266079
Kurtosis                         8.7253408      12.42968
Mean                             0.37892377     0.36995516
Median Absolute Deviation (MAD)  0              0
Skewness                         2.6051505      3.0986322
Sum                              169            165
Variance                         0.61339749     0.67855595
Monotonicity                     Not monotonic  Not monotonic
Histogram with fixed size bins (bins=6)
Dataset A (sorted by frequency)
Value  Count  Frequency (%)
0      338    75.8%
1      60     13.5%
2      42     9.4%
5      3      0.7%
3      2      0.4%
4      1      0.2%

Dataset B (sorted by frequency)
Value  Count  Frequency (%)
0      343    76.9%
1      61     13.7%
2      33     7.4%
5      3      0.7%
3      3      0.7%
4      2      0.4%
6      1      0.2%

Dataset A (sorted by value, smallest first)
Value  Count  Frequency (%)
0      338    75.8%
1      60     13.5%
2      42     9.4%
3      2      0.4%
4      1      0.2%
5      3      0.7%

Dataset B (sorted by value, smallest first)
Value  Count  Frequency (%)
0      343    76.9%
1      61     13.7%
2      33     7.4%
3      3      0.7%
4      2      0.4%
5      3      0.7%
6      1      0.2%

Ticket
Text

Statistic     Dataset A  Dataset B
Distinct      372        383
Distinct (%)  83.4%      85.9%
Missing       0          0
Missing (%)   0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Length

Statistic      Dataset A  Dataset B
Max length     18         18
Median length  17         17
Mean length    6.8340807  6.8632287
Min length     4          3

Characters and Unicode

Statistic            Dataset A  Dataset B
Total characters     3048       3061
Distinct characters  32         35
Distinct categories  5          5
Distinct scripts     2          2
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Dataset A  Dataset B
Unique      321        337
Unique (%)  72.0%      75.6%

Sample

Row      Dataset A  Dataset B
1st row  19947      112059
2nd row  3101264    113056
3rd row  29106      STON/O2. 3101279
4th row  349244     112053
5th row  113760     330979
ValueCountFrequency (%)
pc 35
 
6.1%
c.a 18
 
3.1%
ca 9
 
1.6%
a/5 9
 
1.6%
2343 7
 
1.2%
sc/paris 7
 
1.2%
2 6
 
1.0%
ston/o 6
 
1.0%
347082 6
 
1.0%
w./c 5
 
0.9%
Other values (388) 468
81.2%
ValueCountFrequency (%)
pc 26
 
4.6%
c.a 11
 
1.9%
ston/o 10
 
1.8%
2 10
 
1.8%
a/5 7
 
1.2%
ca 6
 
1.1%
soton/oq 6
 
1.1%
1601 5
 
0.9%
ston/o2 5
 
0.9%
w./c 5
 
0.9%
Other values (401) 477
84.0%

Most occurring characters

ValueCountFrequency (%)
3 366
12.0%
1 315
10.3%
2 313
10.3%
4 240
 
7.9%
7 236
 
7.7%
6 229
 
7.5%
5 195
 
6.4%
0 189
 
6.2%
9 170
 
5.6%
8 141
 
4.6%
Other values (22) 654
21.5%
ValueCountFrequency (%)
3 381
12.4%
1 330
10.8%
2 299
9.8%
7 238
 
7.8%
4 235
 
7.7%
6 222
 
7.3%
0 216
 
7.1%
5 197
 
6.4%
9 164
 
5.4%
8 148
 
4.8%
Other values (25) 631
20.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 2394
78.5%
Uppercase Letter 347
 
11.4%
Other Punctuation 164
 
5.4%
Space Separator 130
 
4.3%
Lowercase Letter 13
 
0.4%
ValueCountFrequency (%)
Decimal Number 2430
79.4%
Uppercase Letter 349
 
11.4%
Other Punctuation 147
 
4.8%
Space Separator 122
 
4.0%
Lowercase Letter 13
 
0.4%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3 366
15.3%
1 315
13.2%
2 313
13.1%
4 240
10.0%
7 236
9.9%
6 229
9.6%
5 195
8.1%
0 189
7.9%
9 170
7.1%
8 141
 
5.9%
ValueCountFrequency (%)
3 381
15.7%
1 330
13.6%
2 299
12.3%
7 238
9.8%
4 235
9.7%
6 222
9.1%
0 216
8.9%
5 197
8.1%
9 164
6.7%
8 148
 
6.1%
Space Separator
ValueCountFrequency (%)
130
100.0%
ValueCountFrequency (%)
122
100.0%
Other Punctuation
ValueCountFrequency (%)
. 117
71.3%
/ 47
28.7%
ValueCountFrequency (%)
. 95
64.6%
/ 52
35.4%
Uppercase Letter
ValueCountFrequency (%)
C 90
25.9%
P 58
16.7%
A 49
14.1%
O 42
12.1%
S 41
11.8%
N 17
 
4.9%
T 15
 
4.3%
I 8
 
2.3%
W 7
 
2.0%
R 6
 
1.7%
Other values (5) 14
 
4.0%
ValueCountFrequency (%)
O 70
20.1%
C 68
19.5%
P 41
11.7%
S 40
11.5%
A 32
9.2%
N 30
8.6%
T 27
 
7.7%
Q 11
 
3.2%
W 9
 
2.6%
E 5
 
1.4%
Other values (6) 16
 
4.6%
Lowercase Letter
ValueCountFrequency (%)
a 4
30.8%
s 3
23.1%
i 3
23.1%
r 3
23.1%
ValueCountFrequency (%)
a 4
30.8%
s 3
23.1%
r 2
15.4%
i 2
15.4%
l 1
 
7.7%
e 1
 
7.7%

Most occurring scripts

ValueCountFrequency (%)
Common 2688
88.2%
Latin 360
 
11.8%
ValueCountFrequency (%)
Common 2699
88.2%
Latin 362
 
11.8%

Most frequent character per script

Common
ValueCountFrequency (%)
3 366
13.6%
1 315
11.7%
2 313
11.6%
4 240
8.9%
7 236
8.8%
6 229
8.5%
5 195
7.3%
0 189
7.0%
9 170
6.3%
8 141
 
5.2%
Other values (3) 294
10.9%
ValueCountFrequency (%)
3 381
14.1%
1 330
12.2%
2 299
11.1%
7 238
8.8%
4 235
8.7%
6 222
8.2%
0 216
8.0%
5 197
7.3%
9 164
6.1%
8 148
 
5.5%
Other values (3) 269
10.0%
Latin
ValueCountFrequency (%)
C 90
25.0%
P 58
16.1%
A 49
13.6%
O 42
11.7%
S 41
11.4%
N 17
 
4.7%
T 15
 
4.2%
I 8
 
2.2%
W 7
 
1.9%
R 6
 
1.7%
Other values (9) 27
 
7.5%
ValueCountFrequency (%)
O 70
19.3%
C 68
18.8%
P 41
11.3%
S 40
11.0%
A 32
8.8%
N 30
8.3%
T 27
 
7.5%
Q 11
 
3.0%
W 9
 
2.5%
E 5
 
1.4%
Other values (12) 29
8.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3048
100.0%
ValueCountFrequency (%)
ASCII 3061
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3 366
12.0%
1 315
10.3%
2 313
10.3%
4 240
 
7.9%
7 236
 
7.7%
6 229
 
7.5%
5 195
 
6.4%
0 189
 
6.2%
9 170
 
5.6%
8 141
 
4.6%
Other values (22) 654
21.5%
ValueCountFrequency (%)
3 381
12.4%
1 330
10.8%
2 299
9.8%
7 238
 
7.8%
4 235
 
7.7%
6 222
 
7.3%
0 216
 
7.1%
5 197
 
6.4%
9 164
 
5.4%
8 148
 
4.8%
Other values (25) 631
20.6%

Fare
Real number (ℝ)

Statistic     Dataset A  Dataset B
Distinct      185        165
Distinct (%)  41.5%      37.0%
Missing       0          0
Missing (%)   0.0%       0.0%
Infinite      0          0
Infinite (%)  0.0%       0.0%
Mean          32.128287  30.313593
Minimum       0          0
Maximum       512.3292   512.3292
Zeros         8          11
Zeros (%)     1.8%       2.5%
Negative      0          0
Negative (%)  0.0%       0.0%
Memory size   7.0 KiB    7.0 KiB

Quantile statistics

Statistic                  Dataset A  Dataset B
Minimum                    0          0
5-th percentile            7.22605    7.0542
Q1                         8.05       7.8958
median                     15.3729    13.5
Q3                         31.275     29.125
95-th percentile           113.275    110.38748
Maximum                    512.3292   512.3292
Range                      512.3292   512.3292
Interquartile range (IQR)  23.225     21.2292

Descriptive statistics

Statistic                        Dataset A      Dataset B
Standard deviation               46.498004      45.815062
Coefficient of variation (CV)    1.4472606      1.5113703
Kurtosis                         31.090155      33.330745
Mean                             32.128287      30.313593
Median Absolute Deviation (MAD)  7.6229         6.2708
Skewness                         4.4807818      4.6638107
Sum                              14329.216      13519.862
Variance                         2162.0644      2099.0199
Monotonicity                     Not monotonic  Not monotonic
Histogram with fixed size bins (bins=50)
Dataset A (sorted by frequency)
Value               Count  Frequency (%)
13                  22     4.9%
7.8958              19     4.3%
8.05                18     4.0%
10.5                16     3.6%
7.75                15     3.4%
26                  13     2.9%
16.1                8      1.8%
0                   8      1.8%
26.55               8      1.8%
7.8542              8      1.8%
Other values (175)  311    69.7%

Dataset B (sorted by frequency)
Value               Count  Frequency (%)
8.05                24     5.4%
13                  23     5.2%
7.75                20     4.5%
26                  17     3.8%
7.8958              15     3.4%
10.5                14     3.1%
7.925               13     2.9%
0                   11     2.5%
26.55               9      2.0%
7.225               8      1.8%
Other values (155)  292    65.5%

Dataset A (sorted by value, smallest first)
Value   Count  Frequency (%)
0       8      1.8%
6.2375  1      0.2%
6.4958  1      0.2%
6.75    1      0.2%
6.975   1      0.2%
7.0458  1      0.2%
7.05    2      0.4%
7.0542  1      0.2%
7.125   1      0.2%
7.1417  1      0.2%

Dataset B (sorted by value, smallest first)
Value   Count  Frequency (%)
0       11     2.5%
4.0125  1      0.2%
5       1      0.2%
6.2375  1      0.2%
6.95    1      0.2%
6.975   2      0.4%
7.05    5      1.1%
7.0542  2      0.4%
7.125   2      0.4%
7.1417  1      0.2%

Cabin
Text

Statistic     Dataset A  Dataset B
Distinct      85         89
Distinct (%)  85.9%      89.0%
Missing       347        346
Missing (%)   77.8%      77.6%
Memory size   7.0 KiB    7.0 KiB

Length

Statistic      Dataset A  Dataset B
Max length     15         11
Median length  3          3
Mean length    3.4242424  3.45
Min length     1          1

Characters and Unicode

Statistic            Dataset A  Dataset B
Total characters     339        345
Distinct characters  19         18
Distinct categories  3          3
Distinct scripts     2          2
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Dataset A  Dataset B
Unique      71         79
Unique (%)  71.7%      79.0%

Sample

Row      Dataset A  Dataset B
1st row  C52        B94
2nd row  B96 B98    A19
3rd row  C91        B42
4th row  D20        G6
5th row  B96 B98    E46
ValueCountFrequency (%)
c52 2
 
1.8%
e24 2
 
1.8%
e67 2
 
1.8%
f 2
 
1.8%
c124 2
 
1.8%
d20 2
 
1.8%
b98 2
 
1.8%
b28 2
 
1.8%
e121 2
 
1.8%
b96 2
 
1.8%
Other values (84) 91
82.0%
ValueCountFrequency (%)
c23 3
 
2.6%
c27 3
 
2.6%
c25 3
 
2.6%
b96 2
 
1.8%
c125 2
 
1.8%
g6 2
 
1.8%
c52 2
 
1.8%
b98 2
 
1.8%
f 2
 
1.8%
d35 2
 
1.8%
Other values (87) 91
79.8%

Most occurring characters

ValueCountFrequency (%)
2 34
 
10.0%
1 30
 
8.8%
B 30
 
8.8%
C 28
 
8.3%
6 25
 
7.4%
3 22
 
6.5%
0 21
 
6.2%
5 20
 
5.9%
8 20
 
5.9%
D 18
 
5.3%
Other values (9) 91
26.8%
ValueCountFrequency (%)
2 41
11.9%
C 37
10.7%
3 30
 
8.7%
B 28
 
8.1%
5 24
 
7.0%
6 23
 
6.7%
1 21
 
6.1%
D 19
 
5.5%
9 18
 
5.2%
4 17
 
4.9%
Other values (8) 87
25.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 216
63.7%
Uppercase Letter 111
32.7%
Space Separator 12
 
3.5%
ValueCountFrequency (%)
Decimal Number 217
62.9%
Uppercase Letter 114
33.0%
Space Separator 14
 
4.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2 34
15.7%
1 30
13.9%
6 25
11.6%
3 22
10.2%
0 21
9.7%
5 20
9.3%
8 20
9.3%
4 16
7.4%
9 15
6.9%
7 13
 
6.0%
ValueCountFrequency (%)
2 41
18.9%
3 30
13.8%
5 24
11.1%
6 23
10.6%
1 21
9.7%
9 18
8.3%
4 17
7.8%
7 16
 
7.4%
0 14
 
6.5%
8 13
 
6.0%
Uppercase Letter
ValueCountFrequency (%)
B 30
27.0%
C 28
25.2%
D 18
16.2%
E 17
15.3%
A 8
 
7.2%
F 6
 
5.4%
G 3
 
2.7%
T 1
 
0.9%
ValueCountFrequency (%)
C 37
32.5%
B 28
24.6%
D 19
16.7%
E 13
 
11.4%
A 8
 
7.0%
F 5
 
4.4%
G 4
 
3.5%
Space Separator
ValueCountFrequency (%)
12
100.0%
ValueCountFrequency (%)
14
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 228
67.3%
Latin 111
32.7%
ValueCountFrequency (%)
Common 231
67.0%
Latin 114
33.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2 34
14.9%
1 30
13.2%
6 25
11.0%
3 22
9.6%
0 21
9.2%
5 20
8.8%
8 20
8.8%
4 16
7.0%
9 15
6.6%
7 13
 
5.7%
ValueCountFrequency (%)
2 41
17.7%
3 30
13.0%
5 24
10.4%
6 23
10.0%
1 21
9.1%
9 18
7.8%
4 17
7.4%
7 16
 
6.9%
0 14
 
6.1%
14
 
6.1%
Latin
ValueCountFrequency (%)
B 30
27.0%
C 28
25.2%
D 18
16.2%
E 17
15.3%
A 8
 
7.2%
F 6
 
5.4%
G 3
 
2.7%
T 1
 
0.9%
ValueCountFrequency (%)
C 37
32.5%
B 28
24.6%
D 19
16.7%
E 13
 
11.4%
A 8
 
7.0%
F 5
 
4.4%
G 4
 
3.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 339
100.0%
ValueCountFrequency (%)
ASCII 345
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2 34
 
10.0%
1 30
 
8.8%
B 30
 
8.8%
C 28
 
8.3%
6 25
 
7.4%
3 22
 
6.5%
0 21
 
6.2%
5 20
 
5.9%
8 20
 
5.9%
D 18
 
5.3%
Other values (9) 91
26.8%
ValueCountFrequency (%)
2 41
11.9%
C 37
10.7%
3 30
 
8.7%
B 28
 
8.1%
5 24
 
7.0%
6 23
 
6.7%
1 21
 
6.1%
D 19
 
5.5%
9 18
 
5.2%
4 17
 
4.9%
Other values (8) 87
25.2%

Embarked
Categorical

Statistic     Dataset A  Dataset B
Distinct      3          3
Distinct (%)  0.7%       0.7%
Missing       2          1
Missing (%)   0.4%       0.2%
Memory size   7.0 KiB    7.0 KiB

Value  Dataset A  Dataset B
S      323        340
C      89         64
Q      32         41

Length

Statistic      Dataset A  Dataset B
Max length     1          1
Median length  1          1
Mean length    1          1
Min length     1          1

Characters and Unicode

Statistic            Dataset A  Dataset B
Total characters     444        445
Distinct characters  3          3
Distinct categories  1          1
Distinct scripts     1          1
Distinct blocks      1          1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Statistic   Dataset A  Dataset B
Unique      0          0
Unique (%)  0.0%       0.0%

Sample

Row      Dataset A  Dataset B
1st row  S          S
2nd row  S          S
3rd row  S          S
4th row  S          S
5th row  S          Q

Common Values

Dataset A
Value      Count  Frequency (%)
S          323    72.4%
C          89     20.0%
Q          32     7.2%
(Missing)  2      0.4%

Dataset B
Value      Count  Frequency (%)
S          340    76.2%
C          64     14.3%
Q          41     9.2%
(Missing)  1      0.2%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plots for Dataset A and Dataset B]
ValueCountFrequency (%)
s 323
72.7%
c 89
 
20.0%
q 32
 
7.2%
ValueCountFrequency (%)
s 340
76.4%
c 64
 
14.4%
q 41
 
9.2%

Most occurring characters

Dataset A

Value | Count | Frequency (%)
S | 323 | 72.7%
C | 89 | 20.0%
Q | 32 | 7.2%

Dataset B

Value | Count | Frequency (%)
S | 340 | 76.4%
C | 64 | 14.4%
Q | 41 | 9.2%

Most occurring categories

Dataset A

Value | Count | Frequency (%)
Uppercase Letter | 444 | 100.0%

Dataset B

Value | Count | Frequency (%)
Uppercase Letter | 445 | 100.0%

Most frequent character per category

Uppercase Letter
Dataset A

Value | Count | Frequency (%)
S | 323 | 72.7%
C | 89 | 20.0%
Q | 32 | 7.2%

Dataset B

Value | Count | Frequency (%)
S | 340 | 76.4%
C | 64 | 14.4%
Q | 41 | 9.2%

Most occurring scripts

Dataset A

Value | Count | Frequency (%)
Latin | 444 | 100.0%

Dataset B

Value | Count | Frequency (%)
Latin | 445 | 100.0%

Most frequent character per script

Latin
Dataset A

Value | Count | Frequency (%)
S | 323 | 72.7%
C | 89 | 20.0%
Q | 32 | 7.2%

Dataset B

Value | Count | Frequency (%)
S | 340 | 76.4%
C | 64 | 14.4%
Q | 41 | 9.2%

Most occurring blocks

Dataset A

Value | Count | Frequency (%)
ASCII | 444 | 100.0%

Dataset B

Value | Count | Frequency (%)
ASCII | 445 | 100.0%

Most frequent character per block

ASCII
Dataset A

Value | Count | Frequency (%)
S | 323 | 72.7%
C | 89 | 20.0%
Q | 32 | 7.2%

Dataset B

Value | Count | Frequency (%)
S | 340 | 76.4%
C | 64 | 14.4%
Q | 41 | 9.2%

Interactions

[Pairwise interaction plots for Dataset A and Dataset B (25 plot pairs); images not reproduced.]

Correlations

[Correlation heatmaps for Dataset A and Dataset B; images not reproduced.]

Dataset A

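Variable | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked
PassengerId | 1.000 | 0.038 | -0.002 | 0.001 | 0.037 | 0.157 | 0.000 | 0.000 | 0.000
Age | 0.038 | 1.000 | -0.223 | -0.269 | 0.168 | 0.101 | 0.312 | 0.115 | 0.104
SibSp | -0.002 | -0.223 | 1.000 | 0.469 | 0.414 | 0.196 | 0.125 | 0.184 | 0.116
Parch | 0.001 | -0.269 | 0.469 | 1.000 | 0.424 | 0.109 | 0.000 | 0.205 | 0.000
Fare | 0.037 | 0.168 | 0.414 | 0.424 | 1.000 | 0.237 | 0.433 | 0.144 | 0.222
Survived | 0.157 | 0.101 | 0.196 | 0.109 | 0.237 | 1.000 | 0.305 | 0.506 | 0.169
Pclass | 0.000 | 0.312 | 0.125 | 0.000 | 0.433 | 0.305 | 1.000 | 0.057 | 0.302
Sex | 0.000 | 0.115 | 0.184 | 0.205 | 0.144 | 0.506 | 0.057 | 1.000 | 0.111
Embarked | 0.000 | 0.104 | 0.116 | 0.000 | 0.222 | 0.169 | 0.302 | 0.111 | 1.000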

Dataset B

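Variable | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Embarked
PassengerId | 1.000 | 0.075 | -0.009 | -0.027 | 0.023 | 0.064 | 0.000 | 0.000 | 0.000
Age | 0.075 | 1.000 | -0.149 | -0.166 | 0.189 | 0.151 | 0.271 | 0.122 | 0.035
SibSp | -0.009 | -0.149 | 1.000 | 0.519 | 0.439 | 0.150 | 0.103 | 0.240 | 0.106
Parch | -0.027 | -0.166 | 0.519 | 1.000 | 0.412 | 0.140 | 0.042 | 0.300 | 0.000
Fare | 0.023 | 0.189 | 0.439 | 0.412 | 1.000 | 0.263 | 0.431 | 0.208 | 0.159
Survived | 0.064 | 0.151 | 0.150 | 0.140 | 0.263 | 1.000 | 0.307 | 0.543 | 0.139
Pclass | 0.000 | 0.271 | 0.103 | 0.042 | 0.431 | 0.307 | 1.000 | 0.118 | 0.267
Sex | 0.000 | 0.122 | 0.240 | 0.300 | 0.208 | 0.543 | 0.118 | 1.000 | 0.100
Embarked | 0.000 | 0.035 | 0.106 | 0.000 | 0.159 | 0.139 | 0.267 | 0.100 | 1.000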

Missing values

[Plots for Dataset A and Dataset B; images not reproduced.]
A simple visualization of nullity by column.

[Plots for Dataset A and Dataset B; images not reproduced.]
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

[Plots for Dataset A and Dataset B; images not reproduced.]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
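
One common way to compute a nullity correlation is the Pearson correlation between the is-missing indicators of two columns. The sketch below implements that reading in plain Python; the function name `nullity_correlation` and the toy columns are hypothetical, and the report's own computation may differ in detail.

```python
from math import sqrt


def nullity_correlation(col_x, col_y):
    """Pearson correlation between the is-missing indicators of two columns."""
    x = [1.0 if v is None else 0.0 for v in col_x]
    y = [1.0 if v is None else 0.0 for v in col_y]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A column that is never (or always) missing has no defined correlation.
        return float("nan")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy columns; None marks a missing cell.
age = [22.0, None, 30.0, None, 41.0, 19.0]
cabin = ["C85", None, None, None, "E46", "B20"]

# Nullity by column: number of missing cells per column.
missing_counts = {
    "Age": sum(v is None for v in age),
    "Cabin": sum(v is None for v in cabin),
}
print(missing_counts)
print(round(nullity_correlation(age, cabin), 3))
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.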

Sample

Dataset A

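Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
55 | 56 | 1 | 1 | Woolner, Mr. Hugh | male | NaN | 0 | 0 | 19947 | 35.5000 | C52 | S
202 | 203 | 0 | 3 | Johanson, Mr. Jakob Alfred | male | 34.0 | 0 | 0 | 3101264 | 6.4958 | NaN | S
407 | 408 | 1 | 2 | Richards, Master. William Rowe | male | 3.0 | 1 | 1 | 29106 | 18.7500 | NaN | S
797 | 798 | 1 | 3 | Osman, Mrs. Mara | female | 31.0 | 0 | 0 | 349244 | 8.6833 | NaN | S
390 | 391 | 1 | 1 | Carter, Mr. William Ernest | male | 36.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S
313 | 314 | 0 | 3 | Hendekovic, Mr. Ignjac | male | 28.0 | 0 | 0 | 349243 | 7.8958 | NaN | S
164 | 165 | 0 | 3 | Panula, Master. Eino Viljami | male | 1.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S
883 | 884 | 0 | 2 | Banfield, Mr. Frederick James | male | 28.0 | 0 | 0 | C.A./SOTON 34068 | 10.5000 | NaN | S
51 | 52 | 0 | 3 | Nosworthy, Mr. Richard Cater | male | 21.0 | 0 | 0 | A/4. 39886 | 7.8000 | NaN | S
332 | 333 | 0 | 1 | Graham, Mr. George Edward | male | 38.0 | 0 | 1 | PC 17582 | 153.4625 | C91 | S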

Dataset B

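Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
263 | 264 | 0 | 1 | Harrison, Mr. William | male | 40.0 | 0 | 0 | 112059 | 0.0000 | B94 | S
284 | 285 | 0 | 1 | Smith, Mr. Richard William | male | NaN | 0 | 0 | 113056 | 26.0000 | A19 | S
142 | 143 | 1 | 3 | Hakkarainen, Mrs. Pekka Pietari (Elin Matilda Dolck) | female | 24.0 | 1 | 0 | STON/O2. 3101279 | 15.8500 | NaN | S
887 | 888 | 1 | 1 | Graham, Miss. Margaret Edith | female | 19.0 | 0 | 0 | 112053 | 30.0000 | B42 | S
552 | 553 | 0 | 3 | O'Brien, Mr. Timothy | male | NaN | 0 | 0 | 330979 | 7.8292 | NaN | Q
688 | 689 | 0 | 3 | Fischer, Mr. Eberhard Thelander | male | 18.0 | 0 | 0 | 350036 | 7.7958 | NaN | S
315 | 316 | 1 | 3 | Nilsson, Miss. Helmina Josefina | female | 26.0 | 0 | 0 | 347470 | 7.8542 | NaN | S
762 | 763 | 1 | 3 | Barah, Mr. Hanna Assi | male | 20.0 | 0 | 0 | 2663 | 7.2292 | NaN | C
10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | G6 | S
635 | 636 | 1 | 2 | Davis, Miss. Mary | female | 28.0 | 0 | 0 | 237668 | 13.0000 | NaN | S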

Dataset A

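Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
393 | 394 | 1 | 1 | Newell, Miss. Marjorie | female | 23.0 | 1 | 0 | 35273 | 113.2750 | D36 | C
718 | 719 | 0 | 3 | McEvoy, Mr. Michael | male | NaN | 0 | 0 | 36568 | 15.5000 | NaN | Q
424 | 425 | 0 | 3 | Rosblom, Mr. Viktor Richard | male | 18.0 | 1 | 1 | 370129 | 20.2125 | NaN | S
136 | 137 | 1 | 1 | Newsom, Miss. Helen Monypeny | female | 19.0 | 0 | 2 | 11752 | 26.2833 | D47 | S
649 | 650 | 1 | 3 | Stanley, Miss. Amy Zillah Elsie | female | 23.0 | 0 | 0 | CA. 2314 | 7.5500 | NaN | S
172 | 173 | 1 | 3 | Johnson, Miss. Eleanor Ileen | female | 1.0 | 1 | 1 | 347742 | 11.1333 | NaN | S
323 | 324 | 1 | 2 | Caldwell, Mrs. Albert Francis (Sylvia Mae Harbaugh) | female | 22.0 | 1 | 1 | 248738 | 29.0000 | NaN | S
218 | 219 | 1 | 1 | Bazzani, Miss. Albina | female | 32.0 | 0 | 0 | 11813 | 76.2917 | D15 | C
767 | 768 | 0 | 3 | Mangan, Miss. Mary | female | 30.5 | 0 | 0 | 364850 | 7.7500 | NaN | Q
864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S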

Dataset B

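Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
423 | 424 | 0 | 3 | Danbom, Mrs. Ernst Gilbert (Anna Sigrid Maria Brogren) | female | 28.0 | 1 | 1 | 347080 | 14.4000 | NaN | S
763 | 764 | 1 | 1 | Carter, Mrs. William Ernest (Lucile Polk) | female | 36.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S
201 | 202 | 0 | 3 | Sage, Mr. Frederick | male | NaN | 8 | 2 | CA. 2343 | 69.5500 | NaN | S
814 | 815 | 0 | 3 | Tomlin, Mr. Ernest Portage | male | 30.5 | 0 | 0 | 364499 | 8.0500 | NaN | S
703 | 704 | 0 | 3 | Gallagher, Mr. Martin | male | 25.0 | 0 | 0 | 36864 | 7.7417 | NaN | Q
75 | 76 | 0 | 3 | Moen, Mr. Sigurd Hansen | male | 25.0 | 0 | 0 | 348123 | 7.6500 | F G73 | S
271 | 272 | 1 | 3 | Tornquist, Mr. William Henry | male | 25.0 | 0 | 0 | LINE | 0.0000 | NaN | S
718 | 719 | 0 | 3 | McEvoy, Mr. Michael | male | NaN | 0 | 0 | 36568 | 15.5000 | NaN | Q
349 | 350 | 0 | 3 | Dimic, Mr. Jovan | male | 42.0 | 0 | 0 | 315088 | 8.6625 | NaN | S
512 | 513 | 1 | 1 | McGough, Mr. James Robert | male | 36.0 | 0 | 0 | PC 17473 | 26.2875 | E25 | S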

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.